Reliability Modelling of Whole RAID Storage Subsystems
Author
Abstract
Reliability modelling of RAID storage systems with their various components, such as RAID controllers, enclosures, expanders, interconnects and disks, is important from a storage system designer's point of view. A model that can express all the failure characteristics of the whole RAID storage system can be used to evaluate design choices, perform cost-reliability trade-offs and conduct sensitivity analyses. We present a reliability model for RAID storage systems where we try to model all the components as accurately as possible. We use several state-space reduction techniques, such as aggregating all in-series components and hierarchical decomposition, to reduce the size of our model. To automate computation of reliability, we use the PRISM model checker as a CTMC solver where appropriate. Initially, we assume a simple 3-state disk reliability model with independent disk failures. Later, we assume a Weibull model for the disks; we also consider a correlated disk failure model to check correspondence with the available field data. For all other components in the system, we assume an exponential failure distribution. To use the CTMC solver, we approximate the Weibull distribution for a disk using a sum of exponentials, and we first confirm that this model gives results that are in reasonably good agreement with those from sequential Monte Carlo simulation methods for RAID disk subsystems. Next, our model for whole RAID storage systems (that includes, for example, disks, expanders, enclosures) uses Weibull distributions and, where appropriate, correlated failure modes for disks, and exponential distributions with independent failure modes
Similar resources
Scalable Reliability Modelling of RAID Storage Subsystems
Reliability modelling of RAID storage systems with their various components, such as RAID controllers, enclosures, expanders, interconnects and disks, is important from a storage system designer’s point of view. A model that can express all the failure characteristics of the whole RAID storage system can be used to evaluate design choices, perform cost-reliability trade-offs and conduct sensitivity...
ECKD CAID and RAID: When is the Right Time to Write?
Traditional dual copy, CAID, and RAID DASD subsystems can offer improved data reliability in cases of actuator and/or media failures. However, these schemes impose a write penalty for the extra I/Os required to maintain image copies or parity information for data contained in the subsystem. In this paper, we will employ a review of the 3990-3/6 read and write data flows as a basis for discussin...
RAID0.5: Active Data Replication for Low Cost Disk Array Data Protection
RAID has long been established as an effective way to provide highly reliable as well as high-performance disk subsystems. However, reliability in RAID systems comes at the cost of extra disks. In this paper, we describe a mechanism that we have termed RAID0.5 that enables striped disks with very high data reliability but low disk cost. We take advantage of the fact that most disk systems use b...
SCAN: An Efficient Sector Failure Recovery Algorithm for RAID-6 Codes
Recent studies show disks fail much more often in real systems than specified in their data-sheets and RAID-5 may not be able to provide needed reliability for practical systems. It is desirable to have disk arrays and clustered storage systems with higher data redundancy, such as RAID-6. Meanwhile, latest research also indicates disk sector failures occur much more often than whole disk failur...
Reliability Models for Highly Fault-tolerant Storage Systems
We found that a reliability model commonly used to estimate Mean-Time-To-Data-Loss (MTTDL), while suitable for modeling RAID 0 and RAID 5, fails to accurately model systems having a fault-tolerance greater than 1. Therefore, to model the reliability of RAID 6, Triple-Replication, or k-of-n systems requires an alternate technique. In this paper, we explore some alternatives, and evaluate their e...